Skip to content

pmultiqc service update example for FragPipe#564

Merged
ypriverol merged 2 commits intomainfrom
dev
Jan 19, 2026
Merged

pmultiqc service update example for FragPipe#564
ypriverol merged 2 commits intomainfrom
dev

Conversation

@ypriverol
Copy link
Member

@ypriverol ypriverol commented Jan 19, 2026

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Updates to the dependencies has been done.

Summary by CodeRabbit

  • New Features

    • Added support for extracting and processing RAR archives alongside existing archive formats.
  • Updates

    • Updated featured example dataset to showcase a comprehensive FragPipe DDA analysis workflow.

✏️ Tip: You can customize this high-level summary in your review settings.

claude and others added 2 commits January 19, 2026 09:39
- Add rarfile dependency to pmultiqc_service for extracting RAR archives
- Update download_pride_file to handle .rar file extraction
- Update filter_search_files to include RAR archives in search results
- Update FragPipe example in UI from PXD066146 to PXD062399 (DDA.rar)
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 19, 2026

📝 Walkthrough

Walkthrough

The changes introduce RAR archive extraction support to the MultiQC service by adding the rarfile dependency and implementing RAR detection and extraction logic in the PRIDE download workflow. Additionally, example dataset entries in the UI are updated with new accession numbers.

Changes

Cohort / File(s) Summary
RAR Archive Support
pmultiqc_service/app.py, pmultiqc_service/requirements.txt
Adds rarfile module import and dependency; implements RAR file detection in search results filtering and full extraction handling in PRIDE download flow (detect, extract to target directory, remove original); includes logging updates for RAR extraction steps.
UI Example Updates
pmultiqc_service/templates/index.html
Updates FragPipe example entry from accession PXD066146 to PXD062399; refreshes example description and button handlers to reference new dataset.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Improving FragPipe #544: Modifies PRIDE processing/extraction flow in app.py alongside this RAR extraction addition; adds FragPipe grouping and group-based processing logic.

Suggested labels

Review effort 3/5

Poem

🐰 A rabbit hops through archives today,
RAR files unwrapped without delay,
With rarfile's help, the data flows free,
New examples shine for all to see! 📦✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the primary change: updating the FragPipe example in the UI from PXD066146 to PXD062399, which is the most user-facing change in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pmultiqc_service/app.py`:
- Around line 962-971: The RAR extraction branch (the elif checking
filename.lower().endswith(".rar") that opens rarfile.RarFile) lacks the safety
checks implemented in validate_and_extract_zip; update this branch to validate
each archive entry for path traversal (reject entries with absolute paths or
'..' components), enforce a max file count limit and an archive-bomb
compression-ratio check when extracting entries, and catch rarfile.BadRarFile
and other extraction errors to log and fail gracefully; preserve behavior of
removing the original .rar and setting final_file_path to download_dir only on
successful, validated extraction.

In `@pmultiqc_service/requirements.txt`:
- Around line 15-16: The requirements list includes the Python wrapper package
"rarfile" which depends on an external RAR extractor
(unrar/unar/7z/p7zip/bsdtar); update your deployment configuration (e.g.,
Dockerfile or provisioning scripts) to install one of these system tools
(install package like unrar or p7zip-full) and add a note to the project
README/deployment docs describing this runtime dependency, and also pin
"rarfile" to a minimum compatible version in requirements.txt (e.g., change
"rarfile" to a bounded spec like "rarfile>=X.Y") to match the existing version
pinning practice.

Comment on lines +962 to +971
elif filename.lower().endswith(".rar"):
logger.info(f"Extracting RAR file: {filename}")
with rarfile.RarFile(file_path, "r") as rar_ref:
# Extract to the same directory
rar_ref.extractall(download_dir)
# Remove the rar file
os.remove(file_path)
# Return the directory path since rar files contain multiple files
final_file_path = download_dir
logger.info(f"Extracted {filename} to directory")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add security validation for RAR extraction similar to ZIP validation.

The RAR extraction lacks the security checks that validate_and_extract_zip (lines 1691-1740) implements for user uploads:

  • Path traversal validation on entry names
  • Archive bomb detection (compression ratio check)
  • File count limits

While PRIDE datasets are presumably trusted, defense-in-depth is recommended. Consider also adding error handling for rarfile.BadRarFile exceptions.

🔒 Suggested security-hardened implementation
         elif filename.lower().endswith(".rar"):
             logger.info(f"Extracting RAR file: {filename}")
-            with rarfile.RarFile(file_path, "r") as rar_ref:
-                # Extract to the same directory
-                rar_ref.extractall(download_dir)
-            # Remove the rar file
-            os.remove(file_path)
-            # Return the directory path since rar files contain multiple files
-            final_file_path = download_dir
-            logger.info(f"Extracted {filename} to directory")
+            try:
+                with rarfile.RarFile(file_path, "r") as rar_ref:
+                    # Security: Check for path traversal in RAR entry names
+                    for entry in rar_ref.infolist():
+                        if ".." in entry.filename or entry.filename.startswith("/"):
+                            raise Exception(f"RAR file contains invalid path: {entry.filename}")
+                    # Extract to the same directory
+                    rar_ref.extractall(download_dir)
+                # Remove the rar file
+                os.remove(file_path)
+                # Return the directory path since rar files contain multiple files
+                final_file_path = download_dir
+                logger.info(f"Extracted {filename} to directory")
+            except rarfile.BadRarFile as e:
+                logger.error(f"Invalid RAR file {filename}: {e}")
+                raise Exception(f"Invalid RAR file: {filename}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
elif filename.lower().endswith(".rar"):
logger.info(f"Extracting RAR file: {filename}")
with rarfile.RarFile(file_path, "r") as rar_ref:
# Extract to the same directory
rar_ref.extractall(download_dir)
# Remove the rar file
os.remove(file_path)
# Return the directory path since rar files contain multiple files
final_file_path = download_dir
logger.info(f"Extracted {filename} to directory")
elif filename.lower().endswith(".rar"):
logger.info(f"Extracting RAR file: {filename}")
try:
with rarfile.RarFile(file_path, "r") as rar_ref:
# Security: Check for path traversal in RAR entry names
for entry in rar_ref.infolist():
if ".." in entry.filename or entry.filename.startswith("/"):
raise Exception(f"RAR file contains invalid path: {entry.filename}")
# Extract to the same directory
rar_ref.extractall(download_dir)
# Remove the rar file
os.remove(file_path)
# Return the directory path since rar files contain multiple files
final_file_path = download_dir
logger.info(f"Extracted {filename} to directory")
except rarfile.BadRarFile as e:
logger.error(f"Invalid RAR file {filename}: {e}")
raise Exception(f"Invalid RAR file: {filename}")
🤖 Prompt for AI Agents
In `@pmultiqc_service/app.py` around lines 962 - 971, The RAR extraction branch
(the elif checking filename.lower().endswith(".rar") that opens rarfile.RarFile)
lacks the safety checks implemented in validate_and_extract_zip; update this
branch to validate each archive entry for path traversal (reject entries with
absolute paths or '..' components), enforce a max file count limit and an
archive-bomb compression-ratio check when extracting entries, and catch
rarfile.BadRarFile and other extraction errors to log and fail gracefully;
preserve behavior of removing the original .rar and setting final_file_path to
download_dir only on successful, validated extraction.

Comment on lines +15 to +16
tuspyserver>=4.2.0
rarfile No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if rarfile is documented in Dockerfile or deployment configs
fd -e Dockerfile -e dockerfile -e docker-compose -e docker-compose.yml -e "*.yaml" -e "*.yml" | head -20

Repository: bigbio/pmultiqc

Length of output: 41


🌐 Web query:

Python rarfile package system requirements external dependencies

💡 Result:

Short summary — rarfile system requirements and external dependencies:

  • Python: requires Python 3.6+ (use 3.8+ recommended). [1][3]
  • External decompression programs (rarfile calls an external tool to extract compressed files):
    • Preferred: unrar (official non‑free UnRAR binary) — full RAR3/RAR5 support. [1][3]
    • Alternatives: unar, 7z/p7zip (may need p7zip-rar), bsdtar/libarchive (limited RAR support). Behavior/feature support varies by backend. [1][3]
  • Crypto for password‑protected headers: optional Python crypto libraries such as cryptography or PyCrypto/PyCryptodome. [2]
  • Platform notes: some backends are unavailable or limited on certain OSes (e.g., unar stdout support on Windows is poor; bsdtar has limited RAR feature support). See backend table for details. [1]

Sources: rarfile documentation and project page; Debian package FAQ. [1][2][3]


Add external RAR extraction tool to deployment configuration.

The rarfile package is a Python wrapper that requires an external RAR extraction program to function (unrar, unar, 7z/p7zip, or bsdtar). These tools must be installed in the deployment environment—for example, in the Docker image if used for containerized deployment.

Ensure the appropriate tool is installed in your deployment configuration and document this dependency. Additionally, consider pinning a minimum version for rarfile to match the version pinning practice used for other dependencies in this file.

🤖 Prompt for AI Agents
In `@pmultiqc_service/requirements.txt` around lines 15 - 16, The requirements
list includes the Python wrapper package "rarfile" which depends on an external
RAR extractor (unrar/unar/7z/p7zip/bsdtar); update your deployment configuration
(e.g., Dockerfile or provisioning scripts) to install one of these system tools
(install package like unrar or p7zip-full) and add a note to the project
README/deployment docs describing this runtime dependency, and also pin
"rarfile" to a minimum compatible version in requirements.txt (e.g., change
"rarfile" to a bounded spec like "rarfile>=X.Y") to match the existing version
pinning practice.

@ypriverol ypriverol merged commit 698ec29 into main Jan 19, 2026
23 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jan 19, 2026
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants